
Fix ContextVar propagation for ASGI-mounted servers with tasks #2843

Merged
chrisguidry merged 13 commits into release/2.x from debug-task-lifecycle
Jan 12, 2026

Conversation

@chrisguidry
Collaborator

Summary

Fixes background tasks failing with "Background tasks require a running FastMCP server context" when FastMCP is mounted to another ASGI application (FastAPI, Starlette, etc.) or deployed to serverless environments (Lambda, Cloud Run).

Root cause: ContextVars set during the lifespan don't propagate to request handlers in ASGI environments, because the lifespan and the request handlers run in sibling async contexts rather than in a parent-child relationship.

Fix: Context.__aenter__ now sets _current_docket and _current_worker from server instance attributes at request time, ensuring they're available regardless of async context hierarchy.
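
A minimal sketch of that approach, assuming the attribute and ContextVar names used in this description (the real implementation differs in detail):

# Sketch only, not FastMCP's exact code. _current_docket/_current_worker are
# the module-level ContextVars used for dependency injection; fastmcp_server
# is the server instance that owns _docket and _worker.
from contextvars import ContextVar

_current_docket: ContextVar = ContextVar("current_docket", default=None)
_current_worker: ContextVar = ContextVar("current_worker", default=None)


class Context:
    def __init__(self, fastmcp_server):
        self.fastmcp = fastmcp_server

    async def __aenter__(self):
        # Copy the references from the server instance into ContextVars at
        # request time, so they are visible even when the lifespan ran in a
        # sibling async context (ASGI mounts, Lambda, Cloud Run).
        if self.fastmcp._docket is not None:
            self._docket_token = _current_docket.set(self.fastmcp._docket)
        if self.fastmcp._worker is not None:
            self._worker_token = _current_worker.set(self.fastmcp._worker)
        return self

Because the values are read from the server instance on every request, it no longer matters whether the lifespan ran in a parent context or a sibling one.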

Changes

  • context.py: Set docket/worker ContextVars from server._docket/server._worker in __aenter__
  • server.py: Store _worker on server instance (was already storing _docket)
  • pyproject.toml: Bump pydocket to >=0.16.6 (includes Redis ACL and py-key-value fixes)

Closes #2671

🤖 Generated with Claude Code

chrisguidry and others added 12 commits January 9, 2026 11:27
When FastMCP runs with uvicorn, the lifespan is entered twice:
1. FastMCP's outer context (during http_app setup)
2. Starlette's ASGI lifespan (which request handlers inherit from)

The second call was skipping ContextVar setup because _lifespan_result_set
was already True. This caused _current_docket.get() to return None in
request handlers even though server._docket was correctly set.

Fix: Always set ContextVars when entering _lifespan_manager, using the
already-initialized values from self._docket and self._worker.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ContextVars set during lifespan don't propagate to request handlers
in Lambda (works fine locally). As a workaround, fall back to using
server._docket when the ContextVar returns None.

This is a Lambda-specific issue - possibly related to how Lambda Web
Adapter or Lambda's asyncio runtime handles context propagation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Instead of relying on ContextVar propagation from lifespan (which fails
in Lambda), set _current_docket and _current_worker when entering a
Context for each request. This ensures user dependencies like
CurrentDocket() and CurrentWorker() work in all environments.

The values come from server._docket and server._worker which are always
available after lifespan initialization.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds detailed logging to Context.__aenter__ and __aexit__ to track:
- When Context is entered/exited
- Values of server._docket and server._worker
- ContextVar values before and after setting
- Token values for debugging reset issues

This will help diagnose why ContextVars might not propagate in Lambda.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When Redis operations fail, log the full traceback to help diagnose
ACL and permission issues in production environments.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tracing where the Redis ACL error occurs - the initial Redis writes
succeed but error happens somewhere after.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
pydocket 0.16.5 fixes an issue where worker_group_name was passed as
a KEY instead of ARGV in Lua scripts, causing ACL failures when Redis
users are restricted to key patterns.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… time

Remove redundant ContextVar handling:
- _lifespan_manager no longer re-sets ContextVars in early-return branch
- Handler fallback logic removed (no more `if docket is None: docket = server._docket`)

The authoritative place for request-context ContextVars is now Context.__aenter__,
which sets _current_docket and _current_worker from server instance attributes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All three handlers (tool, prompt, resource) now have identical patterns:
- Debug logging for docket access, Redis writes, docket.add, subscriptions
- Try/except with traceback logging around Redis and docket operations
- Consistent error messages with instance_id

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Removes all the verbose debug logging added during diagnosis while
preserving the essential fix: Context.__aenter__ sets _current_docket
and _current_worker from server instance attributes. This ensures
ContextVars work in ASGI environments where lifespan and request
handlers run in sibling async contexts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@chrisguidry chrisguidry requested a review from jlowin January 12, 2026 15:24
@marvin-context-protocol Bot added labels Jan 12, 2026: bug (Something isn't working. Reports of errors, unexpected behavior, or broken functionality.), server (Related to FastMCP server implementation or server-side functionality.), http (Related to HTTP transport, networking, or web server functionality.)
@coderabbitai
Contributor

coderabbitai Bot commented Jan 12, 2026

Walkthrough

The changes extend the per-request context manager to capture and propagate docket and worker context via ContextVars, enabling dependency injection for these resources across async context boundaries. A new internal worker reference is added to the FastMCP server instance and populated during the docket lifecycle. A guard is also introduced to the lifespan manager to prevent re-entrance. Error messages are updated to reference the broader server context requirement, and logging in subscription handlers is adjusted from warning to error level.

🚥 Pre-merge checks: ✅ 5 passed
  • Title check: ✅ Passed. The title accurately summarizes the main change: fixing ContextVar propagation for ASGI-mounted servers with tasks, which is the core issue addressed in the PR.
  • Description check: ✅ Passed. The PR description covers the root cause, the fix, and references the relevant issue #2671. However, the checklist items are incomplete and unchecked despite the PR being ready for review.
  • Linked Issues check: ✅ Passed. The PR addresses issue #2671 by implementing the ContextVar fix to enable tasks in ASGI-mounted scenarios. The context.py and server.py changes set _current_docket and _current_worker at request time from server instance attributes, ensuring availability regardless of async context hierarchy.
  • Out of Scope Changes check: ✅ Passed. All changes are directly related to fixing the ContextVar propagation issue. The modifications to context.py, server.py, and the pydocket version bump in pyproject.toml are all necessary and scoped to the stated objective.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 81.82%, which meets the required threshold of 80.00%.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/fastmcp/server/tasks/subscriptions.py (1)

72-73: Consider using logger.exception() to preserve stack traces.

Switching from warning to error is appropriate for subscription failures. However, removing the traceback (previously exc_info=True) loses valuable debugging information. Using logger.exception() logs at ERROR level while automatically including the traceback.

♻️ Suggested fix
     except Exception as e:
-        logger.error(f"subscribe_to_task_updates failed for {task_id}: {e}")
+        logger.exception(f"subscribe_to_task_updates failed for {task_id}: {e}")
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df5765c and c127dd9.

⛔ Files ignored due to path filters (2)
  • pyproject.toml is excluded by none and included by none
  • uv.lock is excluded by !**/*.lock and included by none
📒 Files selected for processing (4)
  • src/fastmcp/server/context.py
  • src/fastmcp/server/server.py
  • src/fastmcp/server/tasks/handlers.py
  • src/fastmcp/server/tasks/subscriptions.py
🧰 Additional context used
📓 Path-based instructions (1)
src/fastmcp/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/fastmcp/**/*.py: Python ≥ 3.10 with full type annotations required
Prioritize readable, understandable code - clarity over cleverness. Avoid obfuscated or confusing patterns even if shorter
Follow existing patterns and maintain consistency in code implementation
Be intentional about re-exports - don't blindly re-export everything to parent namespaces. Core types defining a module's purpose should be exported. Specialized features can live in submodules. Only re-export to fastmcp.* for most fundamental types
Never use bare except - be specific with exception types

Files:

  • src/fastmcp/server/tasks/subscriptions.py
  • src/fastmcp/server/context.py
  • src/fastmcp/server/tasks/handlers.py
  • src/fastmcp/server/server.py
🧬 Code graph analysis (2)
src/fastmcp/server/tasks/handlers.py (2)
src/fastmcp/server/server.py (1)
  • docket (379-384)
src/fastmcp/server/dependencies.py (1)
  • message (405-406)
src/fastmcp/server/server.py (1)
src/fastmcp/cli/tasks.py (1)
  • worker (61-110)
🪛 Ruff (0.14.10)
src/fastmcp/server/tasks/subscriptions.py

73-73: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run tests: Python 3.13 on ubuntu-latest
  • GitHub Check: Run tests: Python 3.10 on ubuntu-latest
  • GitHub Check: Run tests: Python 3.10 on windows-latest
  • GitHub Check: Run tests with lowest-direct dependencies
🔇 Additional comments (8)
src/fastmcp/server/tasks/handlers.py (3)

59-67: LGTM!

The updated comment accurately documents the new behavior where Docket is retrieved from a ContextVar set at request time by Context.__aenter__. The error message change to "running FastMCP server context" is more descriptive and aligns with the broader context propagation fix.


171-179: Consistent with other handlers.

The same comment and error message pattern applied here maintains consistency across handle_tool_as_task, handle_prompt_as_task, and handle_resource_as_task.


281-289: Consistent error messaging across all task handlers.

All three task handlers now share the same pattern for retrieving Docket from ContextVar and reporting the same user-friendly error message when the server context is unavailable.
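
As a rough illustration, the shared pattern across the three handlers boils down to something like the following (the helper name and error type are illustrative stand-ins, not the actual FastMCP code; the error message is the one quoted in the PR description):

# Sketch of the shared handler pattern described above.
from contextvars import ContextVar

_current_docket: ContextVar = ContextVar("current_docket", default=None)


def require_docket():
    # The ContextVar is populated at request time by Context.__aenter__.
    docket = _current_docket.get()
    if docket is None:
        raise RuntimeError(
            "Background tasks require a running FastMCP server context"
        )
    return docket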

src/fastmcp/server/server.py (3)

200-202: LGTM - Proper initialization of cross-context attributes.

Initializing _worker = None alongside _docket = None maintains symmetry and enables Context.__aenter__ to check and propagate these values at request time.


471-486: Worker lifecycle properly managed on server instance.

Storing the worker reference during the docket lifespan and clearing it on completion enables cross-context access via server._worker. This pairs well with the existing server._docket pattern.


567-571: Key fix for ASGI-mounted server context propagation.

This guard correctly short-circuits when the lifespan has already run (e.g., when http_app() is mounted into FastAPI/Starlette). The comment accurately explains that Context.__aenter__ will set the ContextVars at request time, solving the sibling async context problem.

src/fastmcp/server/context.py (2)

188-205: Core fix for ContextVar propagation - well implemented.

This is the authoritative fix for the ASGI-mounted server issue. By setting _current_docket and _current_worker from server._docket and server._worker at request entry time, ContextVars are properly available regardless of async context hierarchy (child vs. sibling contexts).

The conditional checks (if server._docket is not None) correctly handle:

  • Servers with tasks enabled (docket/worker available)
  • Servers without tasks (docket/worker are None)
  • Mounted servers that skip their own docket lifecycle

208-228: Proper token cleanup in __aexit__.

The LIFO reset order (worker → docket → server) correctly mirrors the set order. Using hasattr() guards before reset handles cases where tokens weren't set (e.g., docket disabled). Deleting attributes after reset prevents stale references.
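
Continuing the sketch from the PR description, the exit path described here might look roughly like this (the real method also resets a server token last):

# Sketch of the __aexit__ cleanup described above, not the verbatim code;
# _current_docket/_current_worker are the ContextVars from the earlier sketch.
class Context:
    async def __aexit__(self, exc_type, exc_val, exc_tb):
        # LIFO reset with hasattr() guards for the tasks-disabled case;
        # deleting the attributes afterwards avoids stale token references.
        if hasattr(self, "_worker_token"):
            _current_worker.reset(self._worker_token)
            del self._worker_token
        if hasattr(self, "_docket_token"):
            _current_docket.reset(self._docket_token)
            del self._docket_token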

@marvin-context-protocol
Contributor

Test Failure Analysis

Summary: Test test_multi_client_with_transforms (line 572 in tests/test_mcp_config.py) is timing out after 5 seconds on Windows (Python 3.10).

Root Cause: The PR changes Context.__aenter__ to remove the background notification flusher (_periodic_flush task) that was running in a task group. This background task was responsible for periodically flushing notifications during long-running operations.

The timeout is happening because:

  1. The test spawns stdio subprocesses (Python servers) that need to communicate with the client
  2. The removed background flusher was handling periodic notification flushes during these long operations
  3. Without it, the test may be hanging waiting for notifications or responses that aren't being properly flushed

Detailed Analysis:

In src/fastmcp/server/context.py:

  • Before: __aenter__ created a task group and started _periodic_flush in the background
  • After: The task group and background flusher are removed entirely
  • The PR only flushes notifications on __aexit__, which may be too late for multi-server stdio scenarios
Relevant Code Changes
# REMOVED:
self._exit_stack = AsyncExitStack()
await self._exit_stack.__aenter__()
tg = await self._exit_stack.enter_async_context(anyio.create_task_group())
self._cancel_scope = anyio.CancelScope()
tg.start_soon(self._periodic_flush)

The _periodic_flush method was handling background notification flushing throughout the context lifetime, not just at exit.

Suggested Solution:

The ContextVar fix (setting _current_docket and _current_worker from server instance) is correct for the ASGI issue. However, you need to restore some form of periodic flushing for stdio transport scenarios. Options:

  1. Keep the background flusher but make it conditional - only start it when not in ASGI/serverless environments (see the sketch after this list)
  2. Manual flush points - Add explicit flush calls at key points in the stdio transport/client code
  3. Different approach - Consider if notifications need to be flushed differently for stdio vs HTTP transports
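
To make option 1 concrete, here is a self-contained illustration of a transport-conditional background flusher. Every name in it (ExampleContext, transport, flush_notifications) is a hypothetical stand-in; only _periodic_flush and the task-group pattern come from the removed snippet above.

# Hypothetical sketch of option 1 -- not FastMCP code.
from contextlib import AsyncExitStack

import anyio


class ExampleContext:
    def __init__(self, transport: str):
        self.transport = transport
        self._exit_stack = None
        self._task_group = None

    async def flush_notifications(self) -> None:
        print("flushing queued notifications")  # placeholder for the real flush

    async def _periodic_flush(self, interval: float = 0.5) -> None:
        # Flush on a timer so long-running stdio operations still emit
        # notifications before the context exits.
        while True:
            await anyio.sleep(interval)
            await self.flush_notifications()

    async def __aenter__(self):
        if self.transport == "stdio":
            # Only stdio needs the background flusher; ASGI/serverless
            # environments skip it and rely on the flush in __aexit__.
            self._exit_stack = AsyncExitStack()
            await self._exit_stack.__aenter__()
            self._task_group = await self._exit_stack.enter_async_context(
                anyio.create_task_group()
            )
            self._task_group.start_soon(self._periodic_flush)
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        await self.flush_notifications()  # final flush for every transport
        if self._task_group is not None:
            self._task_group.cancel_scope.cancel()
        if self._exit_stack is not None:
            await self._exit_stack.__aexit__(exc_type, exc_val, exc_tb)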

Related Files:

Why Windows-Specific?

The test file has a marker at line 42:

pytestmark = pytest.mark.skipif(
    sys.platform.startswith("win32"),
    reason="Windows has process lifecycle issues with stdio subprocesses",
)

However, this marker should skip ALL tests in the file on Windows, but the test is still running. This suggests the skip marker isn't working as expected, OR this specific failure is exposing a real Windows-specific timing issue with stdio subprocesses.

@marvin-context-protocol
Contributor

Update: Additional Finding

I discovered something important - the test should not be running on Windows at all!

The test file tests/test_mcp_config.py has this marker at the top (line 42):

pytestmark = pytest.mark.skipif(
    sys.platform.startswith("win32"),
    reason="Windows has process lifecycle issues with stdio subprocesses",
)

This should skip ALL tests in that file on Windows, but the tests are clearly running (they're passing up until the timeout).

This means:

  1. Either the skipif condition is not working correctly
  2. OR sys.platform is not returning "win32" on the Windows CI runner

Let me investigate the actual platform string on Windows GitHub Actions runners...

Actually, looking at Python docs: on Windows, sys.platform returns "win32" even on 64-bit Windows. So the condition should work.

Hypothesis: The skipif might be evaluated at collection time with the wrong environment, or there's been a change in how pytest handles module-level pytestmark in recent versions.

Immediate Fix:
Rather than debugging the marker issue, you could add a @pytest.mark.skipif(sys.platform.startswith("win32"), ...) decorator directly to the failing test test_multi_client_with_transforms as a workaround. But the root cause (background flusher removal) still needs to be addressed.
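
A minimal sketch of that workaround (the real test's signature and fixtures are elided):

# Per-test skip, mirroring the module-level pytestmark shown above.
import sys

import pytest


@pytest.mark.skipif(
    sys.platform.startswith("win32"),
    reason="Windows has process lifecycle issues with stdio subprocesses",
)
async def test_multi_client_with_transforms():
    ...  # body omitted; see tests/test_mcp_config.py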

@chrisguidry chrisguidry merged commit 9e86dbc into release/2.x Jan 12, 2026
8 checks passed
@chrisguidry chrisguidry deleted the debug-task-lifecycle branch January 12, 2026 15:38